Hierarchical Multilabel Classification Trees for Gene Function Prediction (Extended Abstract)

Authors

  • Hendrik Blockeel
  • Leander Schietgat
  • Jan Struyf
  • Amanda Clare
  • Sašo Džeroski
Abstract

Prediction of gene function is a so-called hierarchical multilabel classification (HMC) task: a single instance can be labelled with multiple classes rather than just one (i.e., a gene can have multiple functions), and these classes are organized in a hierarchy. Many machine learning methods focus on learning predictive models with a single target variable. One can then learn to predict all classes separately and combine the predictions afterwards. An alternative is to upgrade these methods towards the HMC context. In this paper we explore this alternative for classification trees. A comparison of learning HMC trees with learning normal classification trees shows that the former has clear advantages with respect to accuracy, efficiency, and interpretability. It seems worth investigating to what extent these results carry over to other machine learning methods.
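The label structure described in the abstract can be illustrated with a minimal sketch. The toy hierarchy, the class names, and the helper functions `with_ancestors` and `to_vector` are hypothetical and are not taken from the paper. The sketch assumes the common convention that a gene labelled with a class also belongs to all of that class's ancestors.

```python
# Hypothetical toy hierarchy: child -> parent (None marks a top-level class).
PARENT = {
    "1": None,
    "1/1": "1",
    "1/2": "1",
    "2": None,
    "2/1": "2",
}
CLASSES = sorted(PARENT)  # fixed class order for the label vector


def with_ancestors(labels):
    """Close a label set under the hierarchy: a class implies all its ancestors."""
    closed = set()
    for label in labels:
        node = label
        # Walk upwards until the root or an already-visited class is reached.
        while node is not None and node not in closed:
            closed.add(node)
            node = PARENT[node]
    return closed


def to_vector(labels):
    """Encode a label set as a 0/1 vector with one component per class."""
    closed = with_ancestors(labels)
    return [1 if c in closed else 0 for c in CLASSES]


# A gene with two functions in different branches of the hierarchy.
gene_labels = {"1/2", "2/1"}
print(dict(zip(CLASSES, to_vector(gene_labels))))
```

Under this encoding, the "predict all classes separately" approach would learn one single-target model per vector component, whereas an HMC model would predict the whole vector at once.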

Similar resources

Comparing multilabel classification methods for provisional biopharmaceutics class prediction.

The biopharmaceutical classification system (BCS) is now well established and utilized for the development and biowaivers of immediate oral dosage forms. The prediction of BCS class can be carried out using multilabel classification. Unlike single label classification, multilabel classification methods predict more than one class label at the same time. This paper compares two multilabel method...

Hierarchical Multilabel Protein Function Prediction Using Local Neural Networks

Protein function predictions are usually treated as classification problems where each function is regarded as a class label. However, different from conventional classification problems, they have some specificities that make the classification task more complex. First, the problem classes (protein functions) are usually hierarchically structured, with superclasses and subclasses. Second, prot...

Structured Prediction by Conditional Risk Minimization

We propose a general approach for supervised learning with structured output spaces, such as combinatorial and polyhedral sets, that is based on minimizing estimated conditional risk functions. Given a loss function defined over pairs of output labels, we first estimate the conditional risk function by solving a (possibly infinite) collection of regularized least squares problems. A prediction ...

Weighted True Path Rule: a multilabel hierarchical algorithm for gene function prediction

The genome-wide hierarchical classification of gene functions, using biomolecular data from high-throughput biotechnologies, is one of the central topics in bioinformatics and functional genomics. In this paper we present a multilabel hierarchical algorithm inspired by the “true path rule” that governs both the Gene Ontology and the Functional Catalogue (FunCat). In particular we propose an enh...

An Experimental Comparison of Hierarchical Bayes and True Path Rule Ensembles for Protein Function Prediction

The computational genome-wide annotation of gene functions requires the prediction of hierarchically structured functional classes and can be formalized as a multiclass, multilabel, multipath hierarchical classification problem, characterized by very unbalanced classes. We recently proposed two hierarchical protein function prediction methods: the Hierarchical Bayes (hbayes) and True Path Rule ...

Publication date: 2006